Transcriptional regulatory sequence of carcinoembryonic antigen for expression targeting

ABSTRACT

The transcriptional regulatory sequence (TRS) of Carcinoembryonic Antigen (CEA) and molecular chimeras comprising the CEA TRS and DNA encoding a heterologous enzyme are described. Such molecular chimeras are capable of targeting the expression of the heterologous enzyme to CEA-positive cells. The heterologous enzyme may be cytosine deaminase.

This application is filed pursuant to 35 U.S.C. §371 as a U.S. National Phase Application of International Application No. PCT/GB94/02546 filed Nov. 18, 1994, which is a Continuation In Part of Ser. No. 08/154,712 filed Nov. 19, 1993.

TECHNICAL FIELD

The present invention relates to a transcriptional regulatory sequence useful in gene therapy.

BACKGROUND

Colorectal carcinoma (CRC) is the second most frequent cancer and the second leading cause of cancer-associated deaths in the U.S. and Western Europe. The overall five-year survival rate for patients has not meaningfully improved in the last three decades. Prognosis for the CRC cancer patient is associated with the depth of tumor penetration into the bowel wall, the presence of regional lymph node involvement and, most importantly, the presence of distant metastases. The liver is the most common site for distant metastasis and, in approximately 30% of patients, the sole initial site of tumor recurrence after successful resection of the primary colon cancer. Hepatic metastases are the most common cause of death in the CRC cancer patient.

The treatment of choice for the majority of patients with hepatic CRC metastasis is systemic or regional chemotherapy using 5-fluorouracil (5-FU) alone or in combination with other agents such as levamisole. However, despite extensive effort, there is still no satisfactory treatment for hepatic CRC metastasis. Systemic single- and combination-agent chemotherapy and radiation are relatively ineffective, emphasizing the need for new approaches and therapies for the treatment of the disease.

A gene therapy approach is being developed for primary and metastatic liver tumors that exploits the transcriptional differences between normal and metastatic cells. This approach involves linking the transcriptional regulatory sequences (TRS) of a tumor associated marker gene to the encoding sequence of an enzyme, typically a non-mammalian enzyme, to create an artificial chimaeric gene that is selectively expressed in cancer cells. The enzyme should be capable of converting a non-toxic prodrug into a cytotoxic or cytostatic drug thereby allowing for selective elimination of metastatic cells.

The principle of this approach has been demonstrated using an alpha-fetoprotein/Varicella Zoster virus thymidine kinase chimaera to target hepatocellular carcinoma with the enzyme metabolically activating the non-toxic prodrug 6-methoxypurine arabinonucleoside ultimately leading to formation of the cytotoxic anabolite adenine arabinonucleoside triphosphate (see Huber et al, Proc. Natl. Acad. Sci. U.S.A., 88, 8039-8043 (1991) and EP-A-0 415 731).

For the treatment of hepatic metastases of CRC, it is desirable to control the expression of an enzyme with the transcriptional regulatory sequences of a tumor marker associated with such metastases.

CEA is a tumor associated marker that is regulated at the transcriptional level and is expressed by most CRC tumors but is not expressed in normal liver. CEA is widely used as an important diagnostic tool for postoperative surveillance, chemotherapy efficacy determinations, immunolocalisation and immunotherapy. The TRS of CEA are potentially of value in the selective expression of an enzyme in CEA⁺ tumor cells since there appears to be a very low heterogeneity of CEA within metastatic tumors, perhaps because CEA may have an important functional role in metastasis.

The cloning of the CEA gene has been reported and the promoter localised to a region of 424 nucleotides upstream from the translational start (Schrewe et al, Mol. Cell. Biol., 10, 2738-2748 (1990)), but the full TRS was not identified.

SUMMARY

In the work on which the present invention is based, CEA genomic clones have been identified and isolated from the human chromosome 19 genomic library LL19NL01, ATCC number 57766, by standard techniques described hereinafter. The CEA enhancers are especially advantageous for high level expression in CEA-positive cells and no expression in CEA-negative cells.

According to one aspect, the present invention provides a DNA molecule comprising the CEA TRS but without associated CEA coding sequence.

According to another aspect, the present invention provides use of a CEA TRS for targeting expression of a heterologous enzyme to CEA⁺ cells. Preferably the enzyme is capable of catalysing the production of an agent cytotoxic or cytostatic to the CEA⁺ target cells.

As described in more detail hereinafter, the present inventors have sequenced a large part of the CEA gene upstream of the coding sequence. As used herein, the term “CEA TRS” means any part of the CEA gene upstream of the coding sequence which has a transcriptional regulatory effect on a heterologous coding sequence operably linked thereto.

Certain parts of the sequence of the CEA gene upstream of the coding sequence have been identified as making significant contributions to the transcriptional regulatory effect, more particularly increasing the level and/or selectivity of transcription. Preferably the CEA TRS includes all or part of the region from about −299b to about +69b, more preferably about −90b to about +69b. Increases in the level of transcription and/or selectivity can also be obtained by including one or more of the following regions: −14.5 kb to −10.6 kb, preferably −13.6 kb to −10.6 kb, and/or −6.1 kb to −3.8 kb. All of the regions referred to above can be included in either orientation and in different combinations. In addition, repeats of these regions may be included, particularly repeats of the −90b to +69b region, containing for example 2, 3, 4 or more copies of the region. The base numbering refers to the sequence of FIG. 6. The regions referred to are included in the plasmids described in FIG. 5B.
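The region coordinates given above can be tabulated in a short sketch. The Python below is illustrative only: it assumes each coordinate is an approximate base position on the FIG. 6 numbering, and that a region's span is simply its end coordinate minus its start coordinate (no correction for numbering conventions). The names `Region`, `CEA_TRS_REGIONS` and `tandem_length` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """An approximate region on the upstream CEA sequence (hypothetical helper)."""
    label: str
    start: int  # bases; negative values are upstream
    end: int

    def span_bases(self) -> int:
        # Assumption: span is taken as end - start, ignoring numbering conventions.
        return self.end - self.start


# Coordinates as stated in the text ("about" values; kb converted to bases).
CEA_TRS_REGIONS = [
    Region("-299b to +69b", -299, 69),
    Region("-90b to +69b", -90, 69),
    Region("-14.5 kb to -10.6 kb", -14_500, -10_600),
    Region("-13.6 kb to -10.6 kb", -13_600, -10_600),
    Region("-6.1 kb to -3.8 kb", -6_100, -3_800),
]


def tandem_length(region: Region, copies: int) -> int:
    """Approximate length of `copies` repeats of a region placed end to end."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return region.span_bases() * copies


if __name__ == "__main__":
    for region in CEA_TRS_REGIONS:
        print(f"{region.label}: ~{region.span_bases()} bases")
    # Four copies of the -90b to +69b region, as in the "2, 3, 4 or more copies" example.
    print(tandem_length(CEA_TRS_REGIONS[1], 4))
```

Because the source qualifies every coordinate with "about", the computed spans are rough sizes rather than exact fragment lengths.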

DETAILED DESCRIPTION

Gene therapy involves the stable integration of new genes into target cells and the expression of those genes, once they are in place, to alter the phenotype of that particular target cell (for review see Anderson, W. F. Science 226, 401-409 (1984) and McCormick, D. Biotechnology 3, 689-693, (1985)). Gene therapy may be beneficial for the treatment of genetic diseases that involve the replacement of one defective or missing enzyme, such as hypoxanthine-guanine phosphoribosyl transferase in Lesch-Nyhan disease, purine nucleoside phosphorylase in severe immunodeficiency disease, and adenosine deaminase in severe combined immunodeficiency disease.

It has now been found that it is possible to selectively arrest the growth of, or kill, mammalian carcinoma cells with prodrugs, i.e. chemical agents capable of selective conversion to cytotoxic (causing cell death) or cytostatic (suppressing cell multiplication and growth) metabolites. This is achieved by the construction of a molecular chimaera comprising a “target tissue-specific” TRS that is selectively activated in target cells, such as cancerous cells, and that controls the expression of a heterologous enzyme. This molecular chimaera may be manipulated via suitable vectors and incorporated into an infective virion. Upon administration of an infective virion containing the molecular chimaera to a host (e.g., mammal or human), the enzyme is selectively expressed in the target cells. Administration of prodrugs (compounds that are selectively metabolised by the enzyme into metabolites that are either further metabolised to or are, in fact, cytotoxic or cytostatic agents) can then result in the production of the cytotoxic or cytostatic agent in situ in the cancer cell. According to the present invention CEA TRS provides the target tissue specificity.

Molecular chimaeras (recombinant molecules comprised of unnatural combinations of genes or sections of genes), and infective virions (complete viral particles capable of infecting appropriate host cells) are well known in the art of molecular biology.

A number of enzyme prodrug combinations may be used for the above purpose, providing the enzyme is capable of selectively activating the administered compound either directly or through an intermediate to a cytostatic or cytotoxic metabolite. The choice of compound will also depend on the enzyme system used, but the compound must be selectively metabolised by the enzyme either directly or indirectly to a cytotoxic or cytostatic metabolite. The term heterologous enzyme, as used herein, refers to an enzyme that is derived from or associated with a species which is different from the host to be treated and which will display the appropriate characteristics of the above mentioned selectivity. In addition, it will also be appreciated that a heterologous enzyme may also refer to an enzyme that is derived from the host to be treated that has been modified to have unique characteristics unnatural to the host.

The enzyme cytosine deaminase (CD) catalyses the deamination of cytosine to uracil. Cytosine deaminase is present in microbes and fungi but absent in higher eukaryotes. This enzyme catalyses the hydrolytic deamination of cytosine and 5-fluorocytosine (5-FC) to uracil and 5-fluorouracil (5-FU), respectively. Since mammalian cells do not express significant amounts of cytosine deaminase, they are incapable of converting 5-FC to the toxic metabolite 5-FU and therefore 5-fluorocytosine is nontoxic to mammalian cells at concentrations which are effective for antimicrobial activity. 5-Fluorouracil is highly toxic to mammalian cells and is widely used as an anticancer agent.

In mammalian cells, some genes are ubiquitously expressed. Most genes, however, are expressed in a temporal and/or tissue-specific manner, or are activated in response to extracellular inducers. For example, certain genes are actively transcribed only at very precise times in ontogeny in specific cell types, or in response to some inducing stimulus. This regulation is mediated in part by the interaction between transcriptional regulatory sequences (for example, promoter and enhancer regulatory DNA sequences), and sequence-specific, DNA-binding transcriptional protein factors.

It has now been found that it is possible to alter certain mammalian cells, e.g. colorectal carcinoma cells, metastatic colorectal carcinoma cells and hepatic colorectal carcinoma cells to selectively express a heterologous enzyme as hereinbefore defined, e.g. CD. This is achieved by the construction of molecular chimaeras in an expression cassette.

Expression cassettes themselves are well known in the art of molecular biology. Such an expression cassette contains all essential DNA sequences required for expression of the heterologous enzyme in a mammalian cell. For example, a preferred expression cassette will contain a molecular chimaera containing the coding sequence for CD, an appropriate polyadenylation signal for a mammalian gene (i.e., a polyadenylation signal that will function in a mammalian cell), and CEA enhancers and promoter sequences in the correct orientation.

Normally, two DNA sequences are required for the complete and efficient transcriptional regulation of genes that encode messenger RNAs in mammalian cells: promoters and enhancers. Promoters are located immediately upstream (5′) from the start site of transcription. Promoter sequences are required for accurate and efficient initiation of transcription. Different gene-specific promoters reveal a common pattern of organisation. A typical promoter includes an AT-rich region called a TATA box (which is located approximately 30 base pairs 5′ to the transcription initiation start site) and one or more upstream promoter elements (UPEs). The UPEs are a principal target for the interaction with sequence-specific nuclear transcriptional factors. The activity of promoter sequences is modulated by other sequences called enhancers. The enhancer sequence may be a great distance from the promoter in either an upstream (5′) or downstream (3′) position. Hence, enhancers operate in an orientation- and position-independent manner. However, based on similar structural organisation and function that may be interchanged, the absolute distinction between promoters and enhancers is somewhat arbitrary. Enhancers increase the rate of transcription from the promoter sequence. It is predominantly the interaction between sequence-specific transcriptional factors with the UPE and enhancer sequences that enables mammalian cells to achieve tissue-specific gene expression. The presence of these transcriptional protein factors (tissue-specific, trans-activating factors) bound to the UPE and enhancers (cis-acting, regulatory sequences) enables other components of the transcriptional machinery, including RNA polymerase, to initiate transcription with tissue-specific selectivity and accuracy.

The transcriptional regulatory sequence for CEA is suitable for targeting expression in colorectal carcinoma, metastatic colorectal carcinoma, and hepatic colorectal metastases, transformed cells of the gastrointestinal tract, lung, breast and other tissues. By placing the expression of the gene encoding CD under the transcriptional control of the CRC-associated marker gene, CEA, the nontoxic compound, 5-FC, can be metabolically activated to 5-FU selectively in CRC cells (for example, hepatic CRC cells). An advantage of this system is that the generated toxic compound, 5-fluorouracil, can diffuse out of the cell in which it was generated and kill adjacent tumor cells which did not incorporate the artificial gene for cytosine deaminase.

In the work on which the present invention is based, CEA genomic clones were identified and isolated from the human chromosome 19 genomic library LL19NL01, ATCC number 57766, by standard techniques described hereinafter. The cloned CEA sequences comprise CEA enhancers in addition to the CEA promoter. The CEA enhancers are especially advantageous for high level expression in CEA-positive cells and no expression in CEA-negative cells.

The present invention further provides a molecular chimaera comprising a CEA TRS and a DNA sequence operatively linked thereto encoding a heterologous enzyme, preferably an enzyme capable of catalysing the production of an agent cytotoxic or cytostatic to the CEA⁺ cells.

The present invention further provides a molecular chimaera comprising a DNA sequence containing the coding sequence of the gene that codes for a heterologous enzyme under the control of a CEA TRS in an expression cassette.

The present invention further provides in a preferred embodiment a molecular chimaera comprising a CEA TRS which is operatively linked to the coding sequence for the gene encoding a non-mammalian cytosine deaminase (CD). The molecular chimaera comprises a promoter and additionally comprises an enhancer.

In particular, the present invention provides a molecular chimaera comprising a DNA sequence of the coding sequence of the gene coding for the heterologous enzyme, which is preferably CD, additionally including an appropriate polyadenylation sequence, which is linked downstream in a 3′ position and in the proper orientation to a CEA TRS. Most preferably the expression cassette also contains an enhancer sequence.

Preferably non-mammalian CD is selected from the group consisting of bacterial, fungal, and yeast cytosine deaminase.

The molecular chimaera of the present invention may be made utilizing standard recombinant DNA techniques.

Another aspect of the invention is the genomic CEA sequence as described by Seq ID1.

The coding sequence of CD and a polyadenylation signal (for example see Seq IDs 1 and 2) are placed in the proper 3′ orientation to the essential CEA transcriptional regulatory elements. This molecular chimaera enables the selective expression of CD in cells or tissue that normally express CEA. Expression of the CD gene in mammalian CRC and metastatic CRC (hepatic colorectal carcinoma metastases) will enable nontoxic 5-FC to be selectively metabolised to cytotoxic 5-FU.

Accordingly, in another aspect of the present invention, there is provided a method of constructing a molecular chimaera comprising linking a DNA sequence encoding a heterologous enzyme gene, e.g. CD, to a CEA TRS.

In particular the present invention provides a method of constructing a molecular chimaera as herein defined, the method comprising ligating a DNA sequence encoding the coding sequence and polyadenylation signal of the gene for a heterologous enzyme (e.g. non-mammalian CD) to a CEA TRS (e.g., promoter sequence and enhancer sequence).

These molecular chimaeras can be delivered to the target tissue or cells by a delivery system. For administration to a host (e.g., mammal or human), it is necessary to provide an efficient in vivo delivery system that stably incorporates the molecular chimaera into the cells. Known methods utilize techniques of calcium phosphate transfection, electroporation, microinjection, liposomal transfer, ballistic barrage, DNA viral infection or retroviral infection. For a review of this subject see Biotechniques 6, No. 7, (1988).

The technique of retroviral infection of cells to integrate artificial genes employs retroviral shuttle vectors which are known in the art (Miller A. D., Buttimore C. Mol. Cell. Biol. 6, 2895-2902 (1986)). Essentially, retroviral shuttle vectors (retroviruses comprising molecular chimaeras used to deliver and stably integrate the molecular chimaera into the genome of the target cell) are generated using the DNA form of the retrovirus contained in a plasmid. These plasmids also contain sequences necessary for selection and growth in bacteria. Retroviral shuttle vectors are constructed using standard molecular biology techniques well known in the art. Retroviral shuttle vectors have the parental endogenous retroviral genes (e.g., gag, pol and env) removed from the vectors and the DNA sequence of interest is inserted, such as the molecular chimaeras that have been described. The vectors also contain appropriate retroviral regulatory sequences for viral encapsidation, proviral insertion into the target genome, message splicing, termination and polyadenylation. Retroviral shuttle vectors have been derived from the Moloney murine leukaemia virus (Mo-MLV) but it will be appreciated that other retroviruses can be used such as the closely related Moloney murine sarcoma virus. Other DNA viruses may also prove to be useful as delivery systems. The bovine papilloma virus (BPV) replicates extrachromosomally, so that delivery systems based on BPV have the advantage that the delivered gene is maintained in a nonintegrated manner.

Thus according to a further aspect of the present invention there is provided a retroviral shuttle vector comprising the molecular chimaeras as hereinbefore defined.

The advantages of a retroviral-mediated gene transfer system are the high efficiency of the gene delivery to the targeted tissue or cells, sequence specific integration regarding the viral genome (at the 5′ and 3′ long terminal repeat (LTR) sequences) and little rearrangement of delivered DNA compared to other DNA delivery systems.

Accordingly in a preferred embodiment of the present invention there is provided a retroviral shuttle vector comprising a DNA sequence comprising a 5′ viral LTR sequence, a cis-acting psi-encapsidation sequence, a molecular chimaera as hereinbefore defined and a 3′ viral LTR sequence.

In a preferred embodiment, and to help eliminate non-tissue-specific expression of the molecular chimaera, the molecular chimaera is placed in opposite transcriptional orientation to the 5′ retroviral LTR. In addition, a dominant selectable marker gene may also be included that is transcriptionally driven from the 5′ LTR sequence. Such a dominant selectable marker gene may be the bacterial neomycin-resistance gene NEO (aminoglycoside 3′ phosphotransferase type II), which confers on eukaryotic cells resistance to the neomycin analogue Geneticin (antibiotic G418 sulphate; registered trademark of GIBCO). The NEO gene aids in the selection of packaging cells that contain these sequences.

The retroviral vector is preferably based on the Moloney murine leukaemia virus but it will be appreciated that other vectors may be used. Vectors containing a NEO gene as a selectable marker have been described, for example, the N2 vector (Eglitis M. A., Kantoff P., Gilboa E., Anderson W. F. Science 230, 1395-1398 (1985)).

A theoretical problem associated with retroviral shuttle vectors is the potential of retroviral long terminal repeat (LTR) regulatory sequences transcriptionally activating a cellular oncogene at the site of integration in the host genome. This problem may be diminished by creating SIN vectors. SIN vectors are self-inactivating vectors that contain a deletion comprising the promoter and enhancer regions in the retroviral LTR. The LTR sequences of SIN vectors do not transcriptionally activate 5′ or 3′ genomic sequences. The transcriptional inactivation of the viral LTR sequences diminishes insertional activation of adjacent target cell DNA sequences and also aids in the selected expression of the delivered molecular chimaera. SIN vectors are created by removal of approximately 299 bp in the 3′ viral LTR sequence (Gilboa E., Eglitis P. A., Kantoff P. W., Anderson W. F. Biotechniques 4, 504-512 (1986)).

Thus preferably the retroviral shuttle vectors of the present invention are SIN vectors.

Since the parental retroviral gag, pol, and env genes have been removed from these shuttle vectors, a helper virus system may be utilised to provide the gag, pol, and env retroviral gene products in trans to package or encapsidate the retroviral vector into an infective virion. This is accomplished by utilising specialised “packaging” cell lines, which are capable of generating infectious, synthetic virus yet are deficient in the ability to produce any detectable wild-type virus. In this way the artificial synthetic virus contains a chimaera of the present invention packaged into synthetic artificial infectious virions free of wild-type helper virus. This is based on the fact that the helper virus that is stably integrated into the packaging cell contains the viral structural genes, but is lacking the psi-site, a cis-acting regulatory sequence which must be contained in the viral genomic RNA molecule for it to be encapsidated into an infectious viral particle.

Accordingly, in a still further aspect of the present invention, there is provided an infective virion comprising a retroviral shuttle vector, as hereinbefore described, said vector being encapsidated within viral proteins to create an artificial, infective, replication-defective, retrovirus.

In another aspect of the present invention there is provided a method for producing infective virions of the present invention by delivering the artificial retroviral shuttle vector comprising a molecular chimaera of the invention, as hereinbefore described, into a packaging cell line.

The packaging cell line may have stably integrated within it a helper virus lacking a psi-site and other regulatory sequence, as hereinbefore described, or, alternatively, the packaging cell line may be engineered so as to contain helper virus structural genes within its genome. In addition to removal of the psi-site, additional alterations can be made to the helper virus LTR regulatory sequences to ensure that the helper virus is not packaged in virions and is blocked at the level of reverse transcription and viral integration. Alternatively, helper virus structural genes (i.e., gag, pol, and env) may be individually and independently transferred into the packaging line. Since these viral structural genes are separated within the packaging cell's genome, there is little chance of covert recombinations generating wild-type virus.

The present invention also provides a packaging cell line comprising an infective virion, as described hereinbefore, said virion further comprising a retroviral shuttle vector.

The present invention further provides for a packaging cell line comprising a retroviral shuttle vector as described hereinbefore.

In addition to retroviral-mediated gene delivery of the chimeric, artificial, therapeutic gene, other gene delivery systems known to those skilled in the art can be used in accordance with the present invention. These other gene delivery systems include other viral gene delivery systems known in the art, such as the adenovirus delivery systems.

Non-viral delivery systems can be utilized in accordance with the present invention as well. For example, liposomal delivery systems can deliver the therapeutic gene to the tumor site via a liposome. Liposomes can be modified to evade metabolism and/or to have distinct targeting mechanisms associated with them. For example, liposomes which have antibodies incorporated into their structure, such as antibodies to CEA, can have targeting ability to CEA-positive cells. This will increase both the selectivity of the present invention as well as its ability to treat disseminated disease (metastasis).

Another gene delivery system which can be utilized according to the present invention is receptor-mediated delivery, wherein the gene of choice is incorporated into a ligand which recognizes a specific cell receptor. This system can also deliver the gene to a specific cell type. Additional modifications can be made to this receptor-mediated delivery system, such as incorporation of adenovirus components to the gene so that the gene is not degraded by the cellular lysosomal compartment after internalization by the receptor.

The infective virion or the packaging cell line according to the invention may be formulated by techniques well known in the art and may be presented as a formulation (composition) with a pharmaceutically acceptable carrier therefor. Pharmaceutically acceptable carriers, in this instance physiologic aqueous solutions, may comprise liquid medium suitable for use as vehicles to introduce the infective virion into a host. An example of such a carrier is saline. The infective virion or packaging cell line may be a solution or suspension in such a vehicle. Stabilizers and antioxidants and/or other excipients may also be present in such pharmaceutical formulations (compositions), which may be administered to a mammal by any conventional method (e.g., oral or parenteral routes). In particular, the infective virion may be administered by intra-venous or intra-arterial infusion. In the case of treating hepatic metastatic CRC, intra-hepatic arterial infusion may be advantageous. The packaging cell line can be administered directly to the tumor or near the tumor and thereby produce infective virions directly at or near the tumor site.

Accordingly, the present invention provides a pharmaceutical formulation (composition) comprising an infective virion or packaging cell line according to the invention in admixture with a pharmaceutically acceptable carrier.

Additionally, the present invention provides methods of making pharmaceutical formulations (compositions), as herein described, comprising mixing an artificial infective virion, containing a molecular chimaera according to the invention as described hereinbefore, with a pharmaceutically acceptable carrier.

The present invention also provides methods of making pharmaceutical formulations (compositions), as herein described, comprising mixing a packaging cell line, containing an infective virion according to the invention as described hereinbefore, with a pharmaceutically acceptable carrier.

Although any suitable compound that can be selectively converted to a cytotoxic or cytostatic metabolite by the enzyme cytosine deaminase may be utilised, the preferred compound for use according to the invention is 5-FC, in particular for use in treating colorectal carcinoma (CRC), metastatic colorectal carcinoma, or hepatic CRC metastases. 5-FC, which is non-toxic and is used as an antifungal, is converted by CD into the established cancer therapeutic 5-FU.

Any agent that can potentiate the antitumor effects of 5-FU can also potentiate the antitumor effects of 5-FC since, when used according to the present invention, 5-FC is selectively converted to 5-FU. According to another aspect of the present invention, agents such as leucovorin and levamisole, which can potentiate the antitumor effects of 5-FU, can also be used in combination with 5-FC when 5-FC is used according to the present invention. Other agents which can potentiate the antitumor effects of 5-FU are agents which block the metabolism of 5-FU. Examples of such agents are 5-substituted uracil derivatives, for example, 5-ethynyluracil and 5-bromovinyluracil (PCT/GB91/01650 (WO 92/04901); Cancer Research 46, 1094, (1986), which are incorporated herein by reference in their entirety). Therefore, a further aspect of the present invention is the use of an agent which can potentiate the antitumor effects of 5-FU, for example, a 5-substituted uracil derivative such as 5-ethynyluracil or 5-bromovinyluracil, in combination with 5-FC when 5-FC is used according to the present invention. The present invention further includes the use of agents which are metabolised in vivo to the corresponding 5-substituted uracil derivatives described hereinbefore (see Biochemical Pharmacology 38, 2885, (1989), which is incorporated herein by reference in its entirety) in combination with 5-FC when 5-FC is used according to the present invention.

5-FC is readily available (e.g., United States Biochemical, Sigma) and well known in the art. Leucovorin and levamisole are also readily available and well known in the art.

Two significant advantages of the enzyme/prodrug combination of cytosine deaminase/5-fluorocytosine and further aspects of the invention are the following:

1. The metabolic conversion of 5-FC by CD produces 5-FU, which is the drug of choice in the treatment of many different types of cancers, such as colorectal carcinoma.

2. The 5-FU that is selectively produced in one cancer cell can diffuse out of that cell and be taken up by both non-facilitated diffusion and facilitated diffusion into adjacent cells. This produces a neighbouring cell killing effect. This neighbour cell killing effect alleviates the necessity for delivery of the therapeutic molecular chimera to every tumor cell. Rather, delivery of the molecular chimera to a certain percentage of tumor cells can produce the complete eradication of all tumor cells.

The amounts and precise regimen in treating a mammal will of course be the responsibility of the attendant physician, and will depend on a number of factors including the type and severity of the condition to be treated. However, for hepatic metastatic CRC, an intrahepatic arterial infusion of the artificial infective virion at a titer of between 2×10⁵ and 2×10⁷ colony forming units per ml (CFU/ml) infective virions is suitable for a typical tumour. Total amount of virions infused will be dependent on tumour size and is preferably given in divided doses.

Likewise, the packaging cell line is administered directly to a tumour in an amount of between 2×10⁵ and 2×10⁷ cells. Total amount of packaging cell line infused will be dependent on tumour size and is preferably given in divided doses.

Prodrug treatment: Subsequent to infection with the infective virion, certain cytosine compounds (prodrugs of 5-FU) are converted by CD to cytotoxic or cytostatic metabolites (e.g. 5-FC is converted to 5-FU) in target cells. The above mentioned prodrug compounds are administered to the host (e.g. mammal or human) between six hours and ten days, preferably between one and five days, after administration of the infective virion.
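The timing window stated above can be expressed as a simple check. The following Python sketch is an illustration only, not clinical guidance; it assumes the stated bounds are inclusive, which the text does not specify, and the names `ALLOWED_WINDOW`, `PREFERRED_WINDOW` and `classify_prodrug_timing` are hypothetical.

```python
from datetime import timedelta

# Windows taken from the text; treating the end points as inclusive is an assumption.
ALLOWED_WINDOW = (timedelta(hours=6), timedelta(days=10))
PREFERRED_WINDOW = (timedelta(days=1), timedelta(days=5))


def classify_prodrug_timing(elapsed: timedelta) -> str:
    """Classify the time elapsed between virion and prodrug administration."""
    if PREFERRED_WINDOW[0] <= elapsed <= PREFERRED_WINDOW[1]:
        return "preferred"
    if ALLOWED_WINDOW[0] <= elapsed <= ALLOWED_WINDOW[1]:
        return "allowed"
    return "outside window"


if __name__ == "__main__":
    for elapsed in (timedelta(hours=2), timedelta(hours=12), timedelta(days=3)):
        print(elapsed, classify_prodrug_timing(elapsed))
```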

The dose of 5-FC to be given will advantageously be in the range 10 to 500 mg per kg body weight of recipient per day, preferably 50 to 500 mg per kg body weight of recipient per day, more preferably 50 to 250 mg per kg body weight of recipient per day, and most preferably 50 to 150 mg per kg body weight of recipient per day. The mode of administration of 5-FC in humans is well known to those skilled in the art. Oral administration and/or constant intravenous infusion of 5-FC is anticipated by the instant invention to be preferable.
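As a purely arithmetic illustration of the per-kilogram ranges above (and not as dosing guidance), the following Python sketch converts each stated range into a total daily amount for a given body weight. The names `FIVE_FC_RANGES_MG_PER_KG` and `daily_dose_range_mg`, and the 70 kg example weight, are hypothetical and are not taken from the specification.

```python
# Ranges in mg per kg body weight per day, as stated in the text.
FIVE_FC_RANGES_MG_PER_KG = {
    "advantageous": (10, 500),
    "preferred": (50, 500),
    "more preferred": (50, 250),
    "most preferred": (50, 150),
}


def daily_dose_range_mg(body_weight_kg: float, tier: str = "most preferred"):
    """Return (low, high) total daily amounts in mg for the chosen range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = FIVE_FC_RANGES_MG_PER_KG[tier]
    return low * body_weight_kg, high * body_weight_kg


if __name__ == "__main__":
    # Hypothetical 70 kg recipient, used only to show the multiplication.
    for tier in FIVE_FC_RANGES_MG_PER_KG:
        print(tier, daily_dose_range_mg(70, tier))
```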

The doses and mode of administration of leucovorin and levamisole to be used in accordance with the present invention are well known or readily determined by those clinicians skilled in the art of oncology.

The dose and mode of administration of the 5-substituted uracil derivatives can be determined by the skilled oncologist. Preferably, these derivatives are given by intravenous injection or orally at a dose of between 0.01 and 50 mg per kg body weight of the recipient per day, particularly 0.01 to 10 mg per kg body weight per day, and more preferably 0.01 to 0.4 mg per kg body weight per day, depending on the derivative used. An alternative preferred administration regime is 0.5 to 10 mg per kg body weight of recipient once per week.

The following examples serve to illustrate the present invention but should not be construed as a limitation thereof. In the Examples, reference is made to the figures, a brief description of which is as follows:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Diagram of CEA phage clones. The overlapping clones lambdaCEA1, lambdaCEA7, and lambdaCEA5 represent an approximately 26 kb region of CEA genomic sequence. The 11,288 bp HindIII-Sau3A fragment that was sequenced is represented by the heavy line under lambdaCEA1. The 3774 bp HindIII-HindIII fragment that was sequenced is represented by the heavy line under lambdaCEA7. The bent arrows represent the transcription start point for CEA mRNA. The straight arrows represent the oligonucleotides CR15 and CR16. H, HindIII; S, SstI; B, BamHI; E, EcoRI; X, XbaI.

FIG. 2: Restriction map of part of lambdaCEA1. The arrow head represents the approximate location of the transcription initiation point for CEA mRNA. Lines below the map represent the CEA inserts of pBS+ subclones. These subclones are convenient sources for numerous CEA restriction fragments.

FIG. 3: Mapplot of 15,056 bp HindIII to Sau3A fragment from CEA genomic DNA showing consensus sequences. Schematic representation of some of the consensus sequences found in the CEA sequence of Seq IDs 1 and 2. The consensus sequences shown here are from the transcriptional dictionary of Locker and Buzard (DNA Sequence 1, 3-11 (1980)). The lysozymal silencer is coded B18. The last line represents 90% homology to the topoisomerase II cleavage consensus.

FIG. 4: Cloning scheme for CEA constructs extending from −299 bp to +69 bp.

FIG. 5A: Cloning scheme for CEA constructs extending from −10.7 kb to +69 bp.

FIG. 5B: Coordinates for CEA sequence present in several CEA/luciferase clones. CEA sequences were cloned into the multiple cloning region of pGL2-Basic (Promega Corp.) by standard techniques.

FIGS. 5C and 5D: Transient luciferase assays. Transient transfections and luciferase assays were performed in quadruplicate by standard techniques using DOTAP (Boehringer Mannheim, Indianapolis, Ind., U.S.A.), luciferase assay system (Promega, Madison, Wis., U.S.A.), and Dynatech luminometer (Chantilly, Va., U.S.A.). CEA-positive cell lines included LoVo (ATCC #CCL 229) and SW1463 (ATCC #CCL 234). CEA-negative cell lines included HuH7 and Hep3B (ATCC #HB 8064). C. Luciferase activity expressed as the percent of pGL2-Control plasmid activity. D. Luciferase activities of LoVo and SW1463 expressed as fold increase over activity in Hep3B.

FIG. 6: CEA genomic sequence from −14463 to +592, comprising SEQ ID NO: 1 and SEQ ID NO: 2.

EXAMPLE 1 Construction of Transcriptional Regulatory Sequence of Carcinoembryonic Antigen/Cytosine Deaminase Molecular Chimaera

A) Cloning and isolation of the transcriptional regulatory sequence of the carcinoembryonic antigen gene

CEA genomic clones were identified and isolated from the human chromosome 19 genomic library LL19NL01, ATCC #57766, by standard techniques (Richards et al., Cancer Research, 50, 1521-1527 (1990) which is herein incorporated by reference in its entirety). The CEA clones were identified by plaque hybridization to ³²P end-labelled oligonucleotides CR15 and CR16. CR15, 5′-CCCTGTGATCTCCAGGACAGCTCAGTCTC-3′ (SEQ ID NO: 3), and CR16, 5′-GTTTCCTGAGTGATGTCTGTGTGCAATG-3′ (SEQ ID NO: 4), hybridize to a 5′ non-transcribed region of CEA that has little homology to other members of the CEA gene family. Phage DNA was isolated from three clones that hybridized to both oligonucleotide probes. Polymerase chain reaction, restriction mapping, and DNA sequence analysis confirmed that the three clones contained CEA genomic sequences. The three clones are designated lambdaCEA1, lambdaCEA5, and lambdaCEA7 and have inserts of approximately 13.5, 16.2, and 16.7 kb respectively. A partial restriction map of the three overlapping clones is shown in FIG. 1.
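Where each probe lies in the sequenced DNA can be checked with a short script. The sketch below is illustrative: the helper name is an assumption, and the two excerpts are short stretches copied from the SEQ ID NO: 1 listing in this document (the regions ending near listing positions 9240 and 9790). It suggests that CR15 matches the listed strand directly, whereas CR16 matches it only after reverse complementation.

```python
# Complement table for upper- and lower-case DNA letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


CR15 = "CCCTGTGATCTCCAGGACAGCTCAGTCTC"  # SEQ ID NO: 3
CR16 = "GTTTCCTGAGTGATGTCTGTGTGCAATG"   # SEQ ID NO: 4

# Short excerpts copied from the SEQ ID NO: 1 listing (listed strand, lower case).
EXCERPT_NEAR_9240 = "agggcgtctcccagggcagctccctgtgatctccaggacagctcagtctctcacaggctc"
EXCERPT_NEAR_9790 = "gtgacattgcacacagacatcactcaggaaacggattccc"

if __name__ == "__main__":
    print(len(CR15), len(CR16))
    print(CR15.lower() in EXCERPT_NEAR_9240)
    print(reverse_complement(CR16).lower() in EXCERPT_NEAR_9790)
```

Both regions lie upstream of the transcription start, in keeping with the statement that the probes hybridize to 5′ non-transcribed sequence.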

Clone lambdaCEA1 was initially chosen for extensive analysis. Fragments isolated from lambdaCEA1 were subcloned using standard techniques into the plasmid pBS+ (Stratagene Cloning Systems, La Jolla, Calif., U.S.A.) to facilitate sequencing, site-directed mutagenesis, and construction of chimeric genes. The inserts of some clones are represented in FIG. 2. The complete DNA sequence of an 11,288 bp HindIII/Sau3A restriction fragment from lambdaCEA1 (SEQ ID NO: 1) was determined by the dideoxy sequencing method using the dsDNA Cycle Sequencing System from Life Technologies, Inc. and multiple oligonucleotide primers. This sequence extends from −10.7 kb to +0.6 kb relative to the start site of CEA mRNA. The sequence of a 3774 base pair HindIII restriction fragment from lambdaCEA1 was also determined (SEQ ID NO: 2). This sequence extends from −14.5 kb to −10.7 kb relative to the start site of CEA mRNA. This HindIII fragment is present in plasmid pCR145.
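The fragment sizes and coordinates quoted in this specification can be cross-checked with simple arithmetic. The sketch below is a consistency check under stated assumptions, not part of the disclosed method: the function name is hypothetical, and the text does not say whether the numbering includes a position 0, so both conventions are computed. The suggestion that the two sequenced fragments share the 6 bp HindIII site at their junction is likewise only one reading that happens to reproduce the 15,056 bp figure given for FIG. 3.

```python
def span_length(start, end, has_zero=True):
    """Number of bases from start to end inclusive.

    has_zero=False models a numbering scheme in which -1 is followed
    directly by +1 (no position 0).
    """
    length = end - start + 1
    if not has_zero and start < 0 < end:
        length -= 1
    return length


SEQ1_LEN = 11288          # HindIII/Sau3A fragment, SEQ ID NO: 1
SEQ2_LEN = 3774           # HindIII fragment, SEQ ID NO: 2
HINDIII_SITE = "AAGCTT"   # assumed to be shared at the junction

if __name__ == "__main__":
    print(span_length(-14463, 592, has_zero=True))
    print(span_length(-14463, 592, has_zero=False))
    print(SEQ1_LEN + SEQ2_LEN - len(HINDIII_SITE))
```

Under the position-0 convention, the span from −14463 to +592 and the junction-sharing sum both come to 15,056 bp.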

To determine important transcriptional regulatory sequences, various fragments of CEA genomic DNA are linked to a reporter gene such as luciferase or chloramphenicol acetyltransferase. Various fragments of CEA genomic DNA are tested to determine the optimized, cell-type specific TRS that results in high level reporter gene expression in CEA-positive cells but not in CEA-negative cells. The various reporter constructs, along with appropriate controls, are transfected into tissue culture cell lines that express high, low, or no CEA. The reporter gene analysis identifies both positive and negative transcriptional regulatory sequences. The optimized CEA-specific TRS is identified through the reporter gene analysis and is used to specifically direct the expression of any desired linked coding sequence, such as CD or VZV TK, in cancerous cells that express CEA. The optimized CEA-specific TRS, as used herein, refers to any DNA construct that directs suitably high levels of expression in CEA-positive cells and low or no expression in CEA-negative cells. The optimized CEA-specific TRS consists of one or several different fragments of CEA genomic sequence or multimers of selected sequences that are linked together by standard recombinant DNA techniques. It will be appreciated by those skilled in the art that the optimized CEA-specific TRS may also include some sequences that are not derived from the CEA genomic sequences shown in Seq IDs 1 and 2. These other sequences may include sequences from adjoining regions of the CEA locus, such as sequences from the introns, or sequences further upstream or downstream from the sequenced DNA shown in Seq IDs 1 and 2, or they could include transcriptional control elements from other genes that when linked to selected CEA sequences result in the desired CEA-specific regulation.

The CEA sequence of Seq IDs 1 and 2 (FIG. 6) was computer analyzed for characterized consensus sequences which have been associated with gene regulation. Currently not enough is known about transcriptional regulatory sequences to accurately predict by sequence alone whether a sequence will be functional. However, computer searches for characterized consensus sequences can help identify transcriptional regulatory sequences in uncharacterized sequences, since many enhancers and promoters consist of unique combinations and spatial alignments of several characterized consensus sequences as well as other sequences. Since not all transcriptional regulatory sequences have been identified and not all sequences that are identical to characterized consensus sequences are functional, such a computer analysis can only suggest possible regions of DNA that may be functionally important for gene regulation.

Some examples of the consensus sequences that are present in the CEA sequence are shown in FIG. 3. Four copies of a lysozymal silencer consensus sequence have been found in the CEA sequence. Inclusion of one or more copies of this consensus sequence in the molecular chimera can help optimize CEA-specific expression. A cluster of topoisomerase II cleavage consensus sequences identified approximately 4-5 kb upstream of the CEA transcriptional start suggests that this region of CEA sequence may contain important transcriptional regulatory signals that may help optimize CEA-specific expression.

The first fragment of CEA genomic sequence analyzed for transcriptional activity extends from −299 to +69, but it is appreciated by those skilled in the art that other fragments are tested in order to isolate a TRS that directs strong expression in CEA-positive cells but little expression in CEA-negative cells. As diagrammed in FIG. 4, the 943 bp SmaI-HindIII fragment of plasmid 39-5-5 was subcloned into the SmaI-HindIII sites of vector pBS+ (Stratagene Cloning Systems) creating plasmid 96-11. Single-stranded DNA was rescued from cultures of XL1-blue 96-11 using an M13 helper virus by standard techniques. Oligonucleotide CR70, 5′-CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG-3′ (SEQ ID NO: 5), was used as a primer for oligonucleotide-directed mutagenesis to introduce HindIII and EcoRI restriction sites at +65. Clone 109-3 was isolated from the mutagenesis reaction and was verified by restriction and DNA sequence analysis to contain the desired changes in the DNA sequence. CEA genomic sequences −299 to +69, original numbering FIG. 3, were isolated from 109-3 as a 381 bp EcoRI/HindIII fragment. Plasmid pRc/CMV (Invitrogen Corporation, San Diego, Calif., U.S.A.) was restricted with AatII and HindIII and the 4.5 kb fragment was isolated from low melting point agarose by standard techniques. The 4.5 kb fragment of pRc/CMV was ligated to the 381 bp fragment of 109-3 using T4 DNA ligase. During this ligation the compatible HindIII ends of the two different restriction fragments were ligated. Subsequently the ligation reaction was supplemented with the four deoxynucleotides, dATP, dCTP, dGTP, and dTTP, and T4 DNA polymerase in order to blunt the non-compatible AatII and EcoRI ends. After incubating, phenol extracting, and ethanol precipitating the reaction, the DNAs were again incubated with T4 DNA ligase.
The resulting plasmid, pCR92, allows the insertion of any desired coding sequence into the unique HindIII site downstream of the CEA TRS, upstream from a polyadenylation site and linked to a dominant selectable marker. The coding sequence for CD or another desirable effector or reporter gene, when inserted in the correct orientation into the HindIII site, is transcriptionally regulated by the CEA sequences and is preferably expressed in cells that express CEA but not in cells that do not express CEA.
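The effect of the mutagenic oligonucleotide CR70 can be illustrated by scanning it for the two recognition sequences it is said to introduce. In the sketch below the function name is hypothetical, and the unmodified comparison sequence is a short excerpt copied from the SEQ ID NO: 1 listing in this document (the region ending near listing position 10780); the recognition sequences AAGCTT (HindIII) and GAATTC (EcoRI) are the standard ones.

```python
SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i)
            i = seq.find(motif, i + 1)  # step by one so overlapping hits are kept
        hits[enzyme] = positions
    return hits


CR70 = "CCTGGAACTCAAGCTTGAATTCTCCACAGAGGAGG"       # SEQ ID NO: 5
GENOMIC_EXCERPT = "cctggaactcaagctcttctccacagaggagg"  # from the SEQ ID NO: 1 listing

if __name__ == "__main__":
    print(find_sites(CR70))
    print(find_sites(GENOMIC_EXCERPT))
```

The oligonucleotide carries adjacent HindIII and EcoRI sites flanked by sequence that matches the genomic excerpt, while the excerpt itself contains neither site.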

In order to determine the optimized CEA TRS, other reporter gene constructs containing various fragments of CEA genomic sequences are made by standard techniques from DNA isolated from any of the CEA genomic clones (FIGS. 1, 2, 4, and 5). DNA fragments extending from the HindIII site introduced at position +65 (original numbering FIG. 3A) and numerous different upstream sites are isolated and cloned into the unique HindIII site in plasmid pSVOALdelta5′ (De Wet, J. R., et al. Mol. Cell. Biol., 7, 725-737 (1987) which is herein incorporated by reference in its entirety) or any similar reporter gene plasmid to construct luciferase reporter gene constructs, FIGS. 4 and 5. These and similar constructs are used in transient expression assays performed in several CEA-positive and CEA-negative cell lines to determine a strong, CEA-positive cell-type specific TRS. FIGS. 5B, 5C, and 5D show the results obtained from several CEA/luciferase reporter constructs. The optimized TRS is used to regulate the expression of CD or another desirable gene in a cell-type specific pattern in order to be able to specifically kill cancer cells. The desirable expression cassette is added to a retroviral shuttle vector to aid in delivery of the expression cassette to cancerous tissue.
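The two normalisations named in the descriptions of FIGS. 5C and 5D (percent of pGL2-Control activity, and fold increase over Hep3B) can be sketched as follows. This is an illustrative calculation only; the function names are assumptions, and every number in the usage section is a made-up placeholder rather than data from the figures.

```python
from statistics import mean


def percent_of_control(sample_replicates, control_replicates):
    """Mean sample signal as a percentage of the mean control-plasmid signal."""
    return 100.0 * mean(sample_replicates) / mean(control_replicates)


def fold_over_reference(percent_by_cell_line, reference="Hep3B"):
    """Express each cell line's activity as fold increase over the reference line."""
    ref = percent_by_cell_line[reference]
    return {
        name: value / ref
        for name, value in percent_by_cell_line.items()
        if name != reference
    }


if __name__ == "__main__":
    # Hypothetical quadruplicate light-unit readings, for illustration only.
    lovo = percent_of_control([400, 420, 380, 400], [1000, 1000, 1000, 1000])
    hep3b = percent_of_control([20, 20, 20, 20], [1000, 1000, 1000, 1000])
    print(fold_over_reference({"LoVo": lovo, "Hep3B": hep3b}))
```

Normalising to the control plasmid within each cell line before taking the ratio helps separate promoter specificity from differences in transfection efficiency between lines.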

Strains containing plasmids 39-5-5 and 39-5-2 were deposited at the ATCC under the Budapest Treaty with Accession Nos. 68904 and 68905, respectively. A strain containing plasmid pCR92 was deposited with the ATCC under the Budapest Treaty with Accession No. 68914. A strain containing plasmid pCR145 was deposited at the ATCC under the Budapest Treaty with Accession No. 69460.

B) Cloning and isolation of the E. coli gene encoding cytosine deaminase (CD)

The cloning, sequencing and expression of E. coli CD has already been published (Austin & Huber, Molecular Pharmacology, 43, 380-387 (1993) the disclosure of which is incorporated herein by reference). A positive genetic selection was designed for the cloning of the codA gene from E. coli. The selection took advantage of the fact that E. coli is only able to metabolize cytosine via CD. Based on this, an E. coli strain was constructed that could only utilize cytosine as a pyrimidine source when cytosine deaminase was provided in trans. This strain, BA101, contains a deletion of the codAB operon and a mutation in the pyrF gene. The strain was created by transducing a pyrF mutation (obtained from the E. coli strain X82 (E. coli Genetic Stock Center, New Haven, Conn., U.S.A.)) into the strain MBM7007 (W. Dallas, Burroughs Wellcome Co., NC, U.S.A.) which carried a deletion of the chromosome from lac to argF. The pyrF mutation confers a pyrimidine requirement on the strain, BA101. In addition, the strain is unable to metabolize cytosine due to the codAB deletion. Thus, BA101 is able to grow on minimal medium supplemented with uracil but is unable to utilize cytosine as the sole pyrimidine source.

The construction of BA101 provided a means for positive selection of DNA fragments encoding CD. The strain, BA101, was transformed with plasmids carrying inserts from the E. coli chromosome and the transformants were selected for growth on minimal medium supplemented with cytosine. Using this approach, the transformants were screened for the ability to metabolize cytosine, indicating the presence of a DNA fragment encoding CD. Several sources of DNA could be used for the cloning of the codA gene: 1) a library of the E. coli chromosome could be purchased commercially (for example from Clontech, Palo Alto, Calif., U.S.A. or Stratagene, La Jolla, Calif., U.S.A.) and screened; 2) chromosomal DNA could be isolated from E. coli, digested with various restriction enzymes and ligated to plasmid DNA with compatible ends before screening; and/or 3) bacteriophage lambda clones containing mapped E. coli chromosomal DNA inserts could be screened.

Bacteriophage lambda clones (Y. Kohara, National Institute of Genetics, Japan) containing DNA inserts spanning the 6-8 minute region of the E. coli chromosome were screened for the ability to provide transient complementation of the codA defect. Two clones, 137 and 138, were identified in this manner. Large-scale preparations of DNA from these clones were isolated from 500 ml cultures. Restriction enzymes were used to generate DNA fragments ranging in size from 10-12 kilobases. The enzymes used were EcoRI, EcoRI and BamHI, and EcoRI and HindIII. DNA fragments of the desired size were isolated from preparative agarose gels by electroelution. The isolated fragments were ligated to pBR322 (Gibco BRL, Gaithersburg, Md., U.S.A.) with compatible ends. The resulting ligation reactions were used to transform the E. coli strain, DH5α (Gibco BRL, Gaithersburg, Md., U.S.A.). This step was used to amplify the recombinant plasmids resulting from the ligation reactions. The plasmid DNA preparations isolated from the ampicillin-resistant DH5α transformants were digested with the appropriate restriction enzymes to verify the presence of insert DNA. The isolated plasmid DNA was used to transform BA101. The transformed cells were selected for resistance to ampicillin and for the ability to metabolize cytosine. Two clones were isolated, pEA001 and pEA002. The plasmid pEA001 contains an approximately 10.8 kb EcoRI-BamHI insert while pEA002 contains an approximately 11.5 kb EcoRI-HindIII insert. The isolated plasmids were used to transform BA101 to ensure that the ability to metabolize cytosine was the result of the plasmid and not due to a spontaneous chromosomal mutation.

A physical map of the pEA001 DNA insert was generated using restriction enzymes. Deletion derivatives of pEA001 were constructed based on this restriction map. The resulting plasmids were screened for the ability to allow BA101 to metabolize cytosine. Using this approach, the codA gene was localized to a 4.8 kb EcoRI-BglII fragment. The presence of codA within these inserts was verified by enzymatic assays for CD activity. In addition, cell extracts prepared for enzymatic assay were also examined by polyacrylamide gel electrophoresis. Cell extracts that were positive for enzymatic activity also had a protein band migrating with an apparent molecular weight of 52,000.

The DNA sequence of both strands was determined for a 1634 bp fragment. The sequence determination began at the PstI site and extended to the PvuII site, thus including the codA coding domain. An open reading frame of 1283 nucleotides was identified. The thirty amino terminal amino acids were confirmed by protein sequencing. Additional internal amino acid sequences were generated from CNBr-digestion of gel-purified CD.
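Identifying an open reading frame in a sequenced fragment is a mechanical scan for a start codon followed in frame by a stop codon. The sketch below is a generic forward-strand scan, not the analysis actually used by the inventors; the function name and the toy sequence are illustrative assumptions. GTG is accepted as a start codon alongside ATG because the specification describes the native codA start codon as GTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, starts=("ATG", "GTG"), min_codons=2):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    Indices are 0-based, end is exclusive and includes the stop codon.
    min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in starts:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCGTGAAACCCTAAGG"  # toy sequence: GTG AAA CCC TAA in frame 2
    print(find_orfs(toy))
    print(find_orfs(toy, starts=("ATG",)))
```

A scan restricted to ATG would miss an ORF that begins with GTG, which is one reason to make the set of start codons a parameter.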

A 200 bp PstI fragment was isolated that spanned the translational start codon of codA. This fragment was cloned into pBS⁺. Single-stranded DNA was isolated from a 30 ml culture and mutagenized using a custom oligonucleotide BA22 purchased from Synthecell Inc., Rockville, Md., U.S.A. and the oligonucleotide-directed mutagenesis kit (Amersham, Arlington Heights, Ill., U.S.A.). The base changes result in the introduction of a HindIII restriction enzyme site for joining of CD with the CEA TRS and in a translational start codon of ATG rather than GTG. The resulting 90 bp HindIII-PstI fragment is isolated and ligated with the remainder of the cytosine deaminase gene. The chimeric CEA TRS/cytosine deaminase gene is created by ligating the HindIII-PvuII cytosine deaminase-containing DNA fragment with the CEA TRS sequences.

The strain BA101 and the plasmids, pEA001 and pEA003, were deposited with ATCC under the Budapest Treaty with Accession Nos. 55299, 68916, and 68915 respectively.

C) Construction of transcriptional regulatory sequence of carcinoembryonic antigen/cytosine deaminase molecular chimera

A 1508 bp HindIII/PvuII fragment containing the coding sequence for cytosine deaminase is isolated from the plasmid containing the full length CD gene of Example 1B that has been altered to contain a HindIII restriction site just 5′ of the initiation codon. Plasmid pCR92 contains CEA sequences −299 to +69 immediately 5′ to a unique HindIII restriction site and a polyadenylation signal 3′ to a unique ApaI restriction site (Example 1A, FIG. 4). pCR92 is linearised with ApaI, the ends are blunted using dNTPs and T4 DNA polymerase, and subsequently digested with HindIII. The pCR92 HindIII/ApaI fragment is ligated to the 1508 bp HindIII/PvuII fragment containing cytosine deaminase. Plasmid pCEA-1/codA, containing CD inserted in the appropriate orientation relative to the CEA TRS and polyadenylation signal, is identified by restriction enzyme and DNA sequence analysis.

The optimized CEA-specific TRS, the coding sequence for CD with an ATG translation start, and a suitable polyadenylation signal are joined together using standard molecular biology techniques. The resulting plasmid, containing CD inserted in the appropriate orientation relative to the optimized CEA-specific TRS and a polyadenylation signal, is identified by restriction enzyme and DNA sequence analysis.

EXAMPLE 2 Construction of a Retroviral Shuttle Vector Construct Containing the Molecular Chimera of Example 1

The retroviral shuttle vector pL-CEA-1/codA is constructed by ligating a suitable restriction fragment containing the optimized CEA TRS/codA molecular chimera including the polyadenylation signal into an appropriate retroviral shuttle vector, such as N2(XM5) linearised at the XhoI site, using standard molecular biology techniques. The retroviral shuttle vector pL-CEA-1/codA is characterized by restriction endonuclease mapping and partial DNA sequencing.

EXAMPLE 3 Virus Production of Retroviral Constructs of Example 2

The retroviral shuttle construct described in Example 2 is placed into an appropriate packaging cell line, such as PA317, by electroporation or infection. Drug resistant colonies, such as those resistant to G418 when using shuttle vectors containing the NEO gene, are single cell cloned by the limiting dilution method, analyzed by Southern blots, and titred in NIH 3T3 cells to identify the highest producer of full-length virus.

EXAMPLE 4 Further Data on the CEA TRS

In addition to the plasmids shown in FIG. 5B, the following combinations of regions have proved particularly advantageous for high-level expression of the reporter gene in the system described in Example 1A:

pCR177:

(−14.5 kb to −10.6 kb)+(−6.1 kb to −3.9 kb)+(−299b to +69b)

pCR176:

(−13.6 kb to −10.6 kb)+(−6.1 kb to −3.9 kb)+(−299b to +69b)

pCR165:

(−3.9 kb to −6.1 kb)+(4x−90b to +69b)

pCR168:

(−13.6 kb to −10.6 kb)+(4x−90b to +69b).
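The approximate amount of CEA sequence carried by the first two constructs can be estimated by summing the listed regions. The sketch below is a rough illustration: the coordinates are the rounded kb values from the list converted to base pairs, the function name is hypothetical, and the −299 to +69 region is counted as 368 bp on the assumption that the numbering has no position 0. The constructs written with a "4x" element are left out because the text does not spell out how that notation should be expanded.

```python
def approx_length_bp(segments):
    """Sum the approximate lengths of (start, end) coordinate pairs in bp."""
    return sum(abs(end - start) for start, end in segments)


# Regions as listed in the text, with kb values converted to bp (rounded figures).
PCR177_REGIONS = [(-14500, -10600), (-6100, -3900), (-299, 69)]
PCR176_REGIONS = [(-13600, -10600), (-6100, -3900), (-299, 69)]

if __name__ == "__main__":
    print("pCR177:", approx_length_bp(PCR177_REGIONS), "bp (approximate)")
    print("pCR176:", approx_length_bp(PCR176_REGIONS), "bp (approximate)")
```

Because the upstream end points are given only to the nearest 0.1 kb, the totals are good to a few hundred base pairs at best.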

25 1 11288 DNA Homo sapiens 1 aagcttaaaa cccaatggat tgacaacatcaagagttgga acaagtggac atggagatgt 60 tacttgtgga aatttagatg tgttcagctatcgggcagga gaatctgtgt caaattccag 120 catggttcag aagaatcaaa aagtgtcacagtccaaatgt cgaacagtgc aggggataaa 180 actgtggtgc attcaaactg agggatattttggaacatga gaaaggaagg gattgctgct 240 gcacagaaca tggatgatct cacacatagagttgaaagaa aggagtcaat cgcagaatag 300 aaaatgatca ctaattccac ctctataaagtttccaagag gaaaacccaa ttctgctgct 360 agagatcaga atggaggtga cctgtgccttgcaatggctg tgagggtcac gggagtgtca 420 cttagtgcag gcaatgtgcc gtatcttaatctgggcaggg ctttcatgag cacataggaa 480 tgcagacatt actgctgtgt tcattttacttcaccggaaa agaagaataa aatcagccgg 540 gcgcggtggc tcacgcctgt aatcccagcactttagaagg ctgaggtggg cagattactt 600 gaggtcagga gttcaagacc accctggccaatatggtgaa accccggctc tactaaaaat 660 acaaaaatta gctgggcatg gtggtgcgcgcctgtaatcc cagctactcg ggaggctgag 720 gctggacaat tgcttggacc caggaagcagaggttgcagt gagccaagat tgtgccactg 780 cactccagct tgggcaacag agccagactctgtaaaaaaa aaaaaaaaaa aaaaaaaaag 840 aaagaaagaa aaagaaaaga aagtataaaatctctttggg ttaacaaaaa aagatccaca 900 aaacaaacac cagctcttat caaacttacacaactctgcc agagaacagg aaacacaaat 960 actcattaac tcacttttgt ggcaataaaaccttcatgtc aaaaggagac caggacacaa 1020 tgaggaagta aaactgcagg ccctacttgggtgcagagag ggaaaatcca caaataaaac 1080 attaccagaa ggagctaaga tttactgcattgagttcatt ccccaggtat gcaaggtgat 1140 tttaacacct gaaaatcaat cattgcctttactacataga cagattagct agaaaaaaat 1200 tacaactagc agaacagaag caatttggccttcctaaaat tccacatcat atcatcatga 1260 tggagacagt gcagacgcca atgacaataaaaagagggac ctccgtcacc cggtaaacat 1320 gtccacacag ctccagcaag cacccgtcttcccagtgaat cactgtaacc tcccctttaa 1380 tcagccccag gcaaggctgc ctgcgatggccacacaggct ccaacccgtg ggcctcaacc 1440 tcccgcagag gctctccttt ggccaccccatggggagagc atgaggacag ggcagagccc 1500 tctgatgccc acacatggca ggagctgacgccagagccat gggggctgga gagcagagct 1560 gctggggtca gagcttcctg aggacacccaggcctaaggg aaggcagctc cctggatggg 1620 ggcaaccagg ctccgggctc caacctcagagcccgcatgg gaggagccag cactctaggc 1680 ctttcctagg gtgactctga 
ggggaccctgacacgacagg atcgctgaat gcacccgaga 1740 tgaaggggcc accacgggac cctgctctcgtggcagatca ggagagagtg ggacaccatg 1800 ccaggccccc atggcatggc tgcgactgacccaggccact cccctgcatg catcagcctc 1860 ggtaagtcac atgaccaagc ccaggaccaatgtggaagga aggaaacagc atccccttta 1920 gtgatggaac ccaaggtcag tgcaaagagaggccatgagc agttaggaag ggtggtccaa 1980 cctacagcac aaaccatcat ctatcataagtagaagccct gctccatgac ccctgcattt 2040 aaataaacgt ttgttaaatg agtcaaattccctcaccatg agagctcacc tgtgtgtagg 2100 cccatcacac acacaaacac acacacacacacacacacac acacacacac acacagggaa 2160 agtgcaggat cctggacagc accaggcaggcttcacaggc agagcaaaca gcgtgaatga 2220 cccatgcagt gccctgggcc ccatcagctcagagaccctg tgagggctga gatggggcta 2280 ggcaggggag agacttagag agggtggggcctccagggag ggggctgcag ggagctgggt 2340 actgccctcc agggaggggg ctgcagggagctgggtactg ccctccaggg agggggctgc 2400 agggagctgg gtactgccct ccagggagggggctgcaggg agctgggtac tgccctccag 2460 ggagggggct gcagggagct gggtactgccctccagggag gcaggagcac tgttcccaac 2520 agagagcaca tcttcctgca gcagctgcacagacacagga gcccccatga ctgccctggg 2580 ccagggtgtg gattccaaat ttcgtgccccattgggtggg acggaggttg accgtgacat 2640 ccaaggggca tctgtgattc caaacttaaactactgtgcc tacaaaatag gaaataaccc 2700 tactttttct actatctcaa attccctaagcacaagctag caccctttaa atcaggaagt 2760 tcagtcactc ctggggtcct cccatgcccccagtctgact tgcaggtgca cagggtggct 2820 gacatctgtc cttgctcctc ctcttggctcaactgccgcc cctcctgggg gtgactgatg 2880 gtcaggacaa gggatcctag agctggccccatgattgaca ggaaggcagg acttggcctc 2940 cattctgaag actaggggtg tcaagagagctgggcatccc acagagctgc acaagatgac 3000 gcggacagag ggtgacacag ggctcagggcttcagacggg tcgggaggct cagctgagag 3060 ttcagggaca gacctgagga gcctcagtgggaaaagaagc actgaagtgg gaagttctgg 3120 aatgttctgg acaagcctga gtgctctaaggaaatgctcc caccccgatg tagcctgcag 3180 cactggacgg tctgtgtacc tccccgctgcccatcctctc acagcccccg cctctaggga 3240 cacaactcct gccctaacat gcatctttcctgtctcattc cacacaaaag ggcctctggg 3300 gtccctgttc tgcattgcaa ggagtggaggtcacgttccc acagaccacc cagcaacagg 3360 gtcctatgga ggtgcggtca ggaggatcacacgtcccccc atgcccaggg 
gactgactct 3420 gggggtgatg gattggcctg gaggccactggtcccctctg tccctgaggg gaatctgcac 3480 cctggaggct gccacatccc tcctgattctttcagctgag ggcccttctt gaaatcccag 3540 ggaggactca acccccactg ggaaaggcccagtgtggacg gttccacagc agcccagcta 3600 aggcccttgg acacagatcc tgagtgagagaacctttagg gacacaggtg cacggccatg 3660 tccccagtgc ccacacagag caggggcatctggaccctga gtgtgtagct cccgcgactg 3720 aacccagccc ttccccaatg acgtgacccctggggtggct ccaggtctcc agtccatgcc 3780 accaaaatct ccagattgag ggtcctcccttgagtccctg atgcctgtcc aggagctgcc 3840 ccctgagcaa atctagagtg cagagggctgggattgtggc agtaaaagca gccacatttg 3900 tctcaggaag gaaagggagg acatgagctccaggaagggc gatggcgtcc tctagtgggc 3960 gcctcctgtt aatgagcaaa aaggggccaggagagttgag agatcagggc tggccttgga 4020 ctaaggctca gatggagagg actgaggtgcaaagaggggg ctgaagtagg ggagtggtcg 4080 ggagagatgg gaggagcagg taaggggaagccccagggag gccgggggag ggtacagcag 4140 agctctccac tcctcagcat tgacatttggggtggtcgtg ctagtggggt tctgtaagtt 4200 gtagggtgtt cagcaccatc tggggactctacccactaaa tgccagcagg actccctccc 4260 caagctctaa caaccaacaa tgtctccagactttccaaat gtcccctgga gagcaaaatt 4320 gcttctggca gaatcactga tctacgtcagtctctaaaag tgactcatca gcgaaatcct 4380 tcacctcttg ggagaagaat cacaagtgtgagaggggtag aaactgcaga cttcaaaatc 4440 tttccaaaag agttttactt aatcagcagtttgatgtccc aggagaagat acatttagag 4500 tgtttagagt tgatgccaca tggctgcctgtacctcacag caggagcaga gtgggttttc 4560 caagggcctg taaccacaac tggaatgacactcactgggt tacattacaa agtggaatgt 4620 ggggaattct gtagactttg ggaagggaaatgtatgacgt gagcccacag cctaaggcag 4680 tggacagtcc actttgaggc tctcaccatctaggagacat ctcagccatg aacatagcca 4740 catctgtcat tagaaaacat gttttattaagaggaaaaat ctaggctaga agtgctttat 4800 gctctttttt ctctttatgt tcaaattcatatacttttag atcattcctt aaagaagaat 4860 ctatccccct aagtaaatgt tatcactgactggatagtgt tggtgtctca ctcccaaccc 4920 ctgtgtggtg acagtgccct gcttccccagccctgggccc tctctgattc ctgagagctt 4980 tgggtgctcc ttcattagga ggaagagaggaagggtgttt ttaatattct caccattcac 5040 ccatccacct cttagacact gggaagaatcagttgcccac tcttggattt gatcctcgaa 5100 ttaatgacct ctatttctgt 
cccttgtccatttcaacaat gtgacaggcc taagaggtgc 5160 cttctccatg tgatttttga ggagaaggttctcaagataa gttttctcac acctctttga 5220 attacctcca cctgtgtccc catcaccattaccagcagca tttggaccct ttttctgtta 5280 gtcagatgct ttccacctct tgagggtgtatactgtatgc tctctacaca ggaatatgca 5340 gaggaaatag aaaaagggaa atcgcattactattcagaga gaagaagacc tttatgtgaa 5400 tgaatgagag tctaaaatcc taagagagcccatataaaat tattaccagt gctaaaacta 5460 caaaagttac actaacagta aactagaataataaaacatg catcacagtt gctggtaaag 5520 ctaaatcaga tatttttttc ttagaaaaagcattccatgt gtgttgcagt gatgacagga 5580 gtgcccttca gtcaatatgc tgcctgtaatttttgttccc tggcagaatg tattgtcttt 5640 tctcccttta aatcttaaat gcaaaactaaaggcagctcc tgggccccct ccccaaagtc 5700 agctgcctgc aaccagcccc acgaagagcagaggcctgag cttccctggt caaaataggg 5760 ggctagggag cttaaccttg ctcgataaagctgtgttccc agaatgtcgc tcctgttccc 5820 aggggcacca gcctggaggg tggtgagcctcactggtggc ctgatgctta ccttgtgccc 5880 tcacaccagt ggtcactgga accttgaacacttggctgtc gcccggatct gcagatgtca 5940 agaacttctg gaagtcaaat tactgcccacttctccaggg cagatacctg tgaacatcca 6000 aaaccatgcc acagaaccct gcctggggtctacaacacat atggactgtg agcaccaagt 6060 ccagccctga atctgtgacc acctgccaagatgcccctaa ctgggatcca ccaatcactg 6120 cacatggcag gcagcgaggc ttggaggtgcttcgccacaa ggcagcccca atttgctggg 6180 agtttcttgg cacctggtag tggtgaggagccttgggacc ctcaggatta ctccccttaa 6240 gcatagtggg gacccttctg catccccagcaggtgccccg ctcttcagag cctctctctc 6300 tgaggtttac ccagacccct gcaccaatgagaccatgctg aagcctcaga gagagagatg 6360 gagctttgac caggagccgc tcttccttgagggccagggc agggaaagca ggaggcagca 6420 ccaggagtgg gaacaccagt gtctaagcccctgatgagaa cagggtggtc tctcccatat 6480 gcccatacca ggcctgtgaa cagaatcctccttctgcagt gacaatgtct gagaggacga 6540 catgtttccc agcctaacgt gcagccatgcccatctaccc actgcctact gcaggacagc 6600 accaacccag gagctgggaa gctgggagaagacatggaat acccatggct tctcaccttc 6660 ctccagtcca gtgggcacca tttatgcctaggacacccac ctgccggccc caggctctta 6720 agagttaggt cacctaggtg cctctgggaggccgaggcag gagaattgct tgaacccggg 6780 aggcagaggt tgcagtgagc cgagatcacaccactgcact ccagcctggg 
tgacagaatg 6840 agactctgtc tcaaaaaaaa agagaaagatagcatcagtg gctaccaagg gctaggggca 6900 ggggaaggtg gagagttaat gattaatagtatgaagtttc tatgtgagat gatgaaaatg 6960 ttctggaaaa aaaaatatag tggtgaggatgtagaatatt gtgaatataa ttaacggcat 7020 ttaattgtac acttaacatg attaatgtggcatattttat cttatgtatt tgactacatc 7080 caagaaacac tgggagaggg aaagcccaccatgtaaaata cacccaccct aatcagatag 7140 tcctcattgt acccaggtac aggcccctcatgacctgcac aggaataact aaggatttaa 7200 ggacatgagg cttcccagcc aactgcaggtgcacaacata aatgtatctg caaacagact 7260 gagagtaaag ctgggggcac aaacctcagcactgccagga cacacaccct tctcgtggat 7320 tctgacttta tctgacccgg cccactgtccagatcttgtt gtgggattgg gacaagggag 7380 gtcataaagc ctgtccccag ggcactctgtgtgagcacac gagacctccc caccccccca 7440 ccgttaggtc tccacacata gatctgaccattaggcattg tgaggaggac tctagcgcgg 7500 gctcagggat cacaccagag aatcaggtacagagaggaag acggggctcg aggagctgat 7560 ggatgacaca gagcagggtt cctgcagtccacaggtccag ctcaccctgg tgtaggtgcc 7620 ccatccccct gatccaggca tccctgacacagctccctcc cggagcctcc tcccaggtga 7680 cacatcaggg tccctcactc aagctgtccagagagggcag caccttggac agcgcccacc 7740 ccacttcact cttcctccct cacagggctcagggctcagg gctcaagtct cagaacaaat 7800 ggcagaggcc agtgagccca gagatggtgacagggcaatg atccaggggc agctgcctga 7860 aacgggagca ggtgaagcca cagatgggagaagatggttc aggaagaaaa atccaggaat 7920 gggcaggaga ggagaggagg acacaggctctgtggggctg cagcccagga tgggactaag 7980 tgtgaagaca tctcagcagg tgaggccaggtcccatgaac agagaagcag ctcccacctc 8040 ccctgatgca cggacacaca gagtgtgtggtgctgtgccc ccagagtcgg gctctcctgt 8100 tctggtcccc agggagtgag aagtgaggttgacttgtccc tgctcctctc tgctacccca 8160 acattcacct tctcctcatg cccctctctctcaaatatga tttggatcta tgtccccgcc 8220 caaatctcat gtcaaattgt aaaccccaatgttggaggtg gggccttgtg agaagtgatt 8280 ggataatgcg ggtggatttt ctgctttgatgctgtttctg tgatagagat ctcacatgat 8340 ctggttgttt aaaagtgtgt agcacctctcccctctctct ctctctctct tactcatgct 8400 ctgccatgta agacgttcct gtttccccttcaccgtccag aatgattgta agttttctga 8460 ggcctcccca ggagcagaag ccactatgcttcctgtacaa ctgcagaatg atgagcgaat 8520 taaacctctt ttctttataa 
attacccagtctcaggtatt tctttatagc aatgcgagga 8580 cagactaata caatcttcta ctcccagatccccgcacacg cttagcccca gacatcactg 8640 cccctgggag catgcacagc gcagcctcctgccgacaaaa gcaaagtcac aaaaggtgac 8700 aaaaatctgc atttggggac atctgattgtgaaagaggga ggacagtaca cttgtagcca 8760 cagagactgg ggctcaccga gctgaaacctggtagcactt tggcataaca tgtgcatgac 8820 ccgtgttcaa tgtctagaga tcagtgttgagtaaaacagc ctggtctggg gccgctgctg 8880 tccccacttc cctcctgtcc accagagggcggcagagttc ctcccaccct ggagcctccc 8940 caggggctgc tgacctccct cagccgggcccacagcccag cagggtccac cctcacccgg 9000 gtcacctcgg cccacgtcct cctcgccctccgagctcctc acacggactc tgtcagctcc 9060 tccctgcagc ctatcggccg cccacctgaggcttgtcggc cgcccacttg aggcctgtcg 9120 gctgccctct gcaggcagct cctgtcccctacaccccctc cttccccggg ctcagctgaa 9180 agggcgtctc ccagggcagc tccctgtgatctccaggaca gctcagtctc tcacaggctc 9240 cgacgccccc tatgctgtca cctcacagccctgtcattac cattaactcc tcagtcccat 9300 gaagttcact gagcgcctgt ctcccggttacaggaaaact ctgtgacagg gaccacgtct 9360 gtcctgctct ctgtggaatc ccagggcccagcccagtgcc tgacacggaa cagatgctcc 9420 ataaatactg gttaaatgtg tgggagatctctaaaaagaa gcatatcacc tccgtgtggc 9480 ccccagcagt cagagtctgt tccatgtggacacaggggca ctggcaccag catgggagga 9540 ggccagcaag tgcccgcggc tgccccaggaatgaggcctc aacccccaga gcttcagaag 9600 ggaggacaga ggcctgcagg gaatagatcctccggcctga ccctgcagcc taatccagag 9660 ttcagggtca gctcacacca cgtcgaccctggtcagcatc cctagggcag ttccagacaa 9720 ggccggaggt ctcctcttgc cctccagggggtgacattgc acacagacat cactcaggaa 9780 acggattccc ctggacagga acctggctttgctaaggaag tggaggtgga gcctggtttc 9840 catcccttgc tccaacagac ccttctgatctctcccacat acctgctctg ttcctttctg 9900 ggtcctatga ggaccctgtt ctgccaggggtccctgtgca actccagact ccctcctggt 9960 accaccatgg ggaaggtggg gtgatcacaggacagtcagc ctcgcagaga cagagaccac 10020 ccaggactgt cagggagaac atggacaggccctgagccgc agctcagcca acagacacgg 10080 agagggaggg tccccctgga gccttccccaaggacagcag agcccagagt cacccacctc 10140 cctccaccac agtcctctct ttccaggacacacaagacac ctccccctcc acatgcagga 10200 tctggggact cctgagacct ctgggcctgggtctccatcc ctgggtcagt 
ggcggggttg 10260 gtggtactgg agacagaggg ctggtccctccccagccacc acccagtgag cctttttcta 10320 gcccccagag ccacctctgt caccttcctgttgggcatca tcccaccttc ccagagccct 10380 ggagagcatg gggagacccg ggaccctgctgggtttctct gtcacaaagg aaaataatcc 10440 ccctggtgtg acagacccaa ggacagaacacagcagaggt cagcactggg gaagacaggt 10500 tgtcctccca ggggatgggg gtccatccaccttgccgaaa agatttgtct gaggaactga 10560 aaatagaagg gaaaaaagag gagggacaaaagaggcagaa atgagagggg aggggacaga 10620 ggacacctga ataaagacca cacccatgacccacgtgatg ctgagaagta ctcctgccct 10680 aggaagagac tcagggcaga gggaggaaggacagcagacc agacagtcac agcagccttg 10740 acaaaacgtt cctggaactc aagctcttctccacagagga ggacagagca gacagcagag 10800 accatggagt ctccctcggc ccctccccacagatggtgca tcccctggca gaggctcctg 10860 ctcacaggtg aagggaggac aacctgggagagggtgggag gagggagctg gggtctcctg 10920 ggtaggacag ggctgtgaga cggacagagggctcctgttg gagcctgaat agggaagagg 10980 acatcagaga gggacaggag tcacaccagaaaaatcaaat tgaactggaa ttggaaaggg 11040 gcaggaaaac ctcaagagtt ctattttcctagttaattgt cactggccac tacgttttta 11100 aaaatcataa taactgcatc agatgacactttaaataaaa acataaccag ggcatgaaac 11160 actgtcctca tccgcctacc gcggacattggaaaataagc cccaggctgt ggagggccct 11220 gggaaccctc atgaactcat ccacaggaatctgcagcctg tcccaggcac tggggtgcaa 11280 ccaagatc 11288 2 3774 DNA Homosapiens 2 aagcttttta gtgctttaga cagtgagctg gtctgtctaa cccaagtgacctgggctcca 60 tactcagccc cagaagtgaa gggtgaagct gggtggagcc aaaccaggcaagcctaccct 120 cagggctccc agtggcctga gaaccattgg acccaggacc cattacttctagggtaagga 180 aggtacaaac accagatcca accatggtct ggggggacag ctgtcaaatgcctaaaaata 240 tacctgggag aggagcaggc aaactatcac tgccccaggt tctctgaacagaaacagagg 300 ggcaacccaa agtccaaatc caggtgagca ggtgcaccaa atgcccagagatatgacgag 360 gcaagaagtg aaggaaccac ccctgcatca aatgttttgc atgggaaggagaagggggtt 420 gctcatgttc ccaatccagg agaatgcatt tgggatctgc cttcttctcactccttggtt 480 agcaagacta agcaaccagg actctggatt tggggaaaga cgtttatttgtggaggccag 540 tgatgacaat cccacgaggg cctaggtgaa gagggcagga aggctcgagacactggggac 600 tgagtgaaaa ccacacccat gatctgcacc acccatggat 
gctccttcattgctcacctt 660 tctgttgata tcagatggcc ccattttctg taccttcaca gaaggacacaggctagggtc 720 tgtgcatggc cttcatcccc ggggccatgt gaggacagca ggtgggaaagatcatgggtc 780 ctcctgggtc ctgcagggcc agaacattca tcacccatac tgacctcctagatgggaatg 840 gcttccctgg ggctgggcca acggggcctg ggcaggggag aaaggacgtcaggggacagg 900 gaggaagggt catcgagacc cagcctggaa ggttcttgtc tctgaccatccaggatttac 960 ttccctgcat ctacctttgg tcattttccc tcagcaatga ccagctctgcttcctgatct 1020 cagcctccca ccctggacac agcaccccag tccctggccc ggctgcatccacccaatacc 1080 ctgataaccc aggacccatt acttctaggg taaggagggt ccaggagacagaagctgagg 1140 aaaggtctga agaagtcaca tctgtcctgg ccagagggga aaaaccatcagatgctgaac 1200 caggagaatg ttgacccagg aaagggaccg aggacccaag aaaggagtcagaccaccagg 1260 gtttgcctga gaggaaggat caaggccccg agggaaagca gggctggctgcatgtgcagg 1320 acactggtgg ggcatatgtg tcttagattc tccctgaatt cagtgtccctgccatggcca 1380 gactctctac tcaggcctgg acatgctgaa ataggacaat ggccttgtcctctctcccca 1440 ccatttggca agagacataa aggacattcc aggacatgcc ttcctgggaggtccaggttc 1500 tctgtctcac acctcaggga ctgtagttac tgcatcagcc atggtaggtgctgatctcac 1560 ccagcctgtc caggcccttc cactctccac tttgtgacca tgtccaggaccacccctcag 1620 atcctgagcc tgcaaatacc cccttgctgg gtgggtggat tcagtaaacagtgagctcct 1680 atccagcccc cagagccacc tctgtcacct tcctgctggg catcatcccaccttcacaag 1740 cactaaagag catggggaga cctggctagc tgggtttctg catcacaaagaaaataatcc 1800 cccaggttcg gattcccagg gctctgtatg tggagctgac agacctgaggccaggagata 1860 gcagaggtca gccctaggga gggtgggtca tccacccagg ggacaggggtgcaccagcct 1920 tgctactgaa agggcctccc caggacagcg ccatcagccc tgcctgagagctttgctaaa 1980 cagcagtcag aggaggccat ggcagtggct gagctcctgc tccaggccccaacagaccag 2040 accaacagca caatgcagtc cttccccaac gtcacaggtc accaaagggaaactgaggtg 2100 ctacctaacc ttagagccat caggggagat aacagcccaa tttcccaaacaggccagttt 2160 caatcccatg acaatgacct ctctgctctc attcttccca aaataggacgctgattctcc 2220 cccaccatgg atttctccct tgtcccggga gccttttctg ccccctatgatctgggcact 2280 cctgacacac acctcctctc tggtgacata tcagggtccc tcactgtcaagcagtccaga 2340 aaggacagaa 
ccttggacag cgcccatctc agcttcaccc ttcctccttcacagggttca 2400 gggcaaagaa taaatggcag aggccagtga gcccagagat ggtgacaggcagtgacccag 2460 gggcagatgc ctggagcagg agctggcggg gccacaggga gaaggtgatgcaggaaggga 2520 aacccagaaa tgggcaggaa aggaggacac aggctctgtg gggctgcagcccagggttgg 2580 actatgagtg tgaagccatc tcagcaagta aggccaggtc ccatgaacaagagtgggagc 2640 acgtggcttc ctgctctgta tatggggtgg gggattccat gccccatagaaccagatggc 2700 cggggttcag atggagaagg agcaggacag gggatcccca ggataggaggaccccagtgt 2760 ccccacccag gcaggtgact gatgaatggg catgcagggt cctcctgggctgggctctcc 2820 ctttgtccct caggattcct tgaaggaaca tccggaagcc gaccacatctacctggtggg 2880 ttctggggag tccatgtaaa gccaggagct tgtgttgcta ggaggggtcatggcatgtgc 2940 tgggggcacc aaagagagaa acctgagggc aggcaggacc tggtctgaggaggcatggga 3000 gcccagatgg ggagatggat gtcaggaaag gctgccccat cagggagggtgatagcaatg 3060 gggggtctgt gggagtgggc acgtgggatt ccctgggctc tgccaagttccctcccatag 3120 tcacaacctg gggacactgc ccatgaaggg gcgcctttgc ccagccagatgctgctggtt 3180 ctgcccatcc actaccctct ctgctccagc cactctgggt ctttctccagatgccctgga 3240 cagccctggc ctgggcctgt cccctgagag gtgttgggag aagctgagtctctggggaca 3300 ctctcatcag agtctgaaag gcacatcagg aaacatccct ggtctccaggactaggcaat 3360 gaggaaaggg ccccagctcc tccctttgcc actgagaggg tcgaccctgggtggccacag 3420 tgacttctgc gtctgtccca tgcaccctga aaccacaaca aaaccccagccccagaccct 3480 gcaggtacaa tacatgtggg gacagtctgt acccagggga agccagttctctcttcctag 3540 gagaccgggc ctcagggctg tgcccggggc aggcgggggc agcacgtgcctgtccttgag 3600 aactcgggac cttaagggtc tctgctctgt gaggcacagc aaggatccttctgtccagag 3660 atgaaagcag ctcctgcccc tcctctgacc tcttcctcct tcccaaatctcaaccaacaa 3720 ataggtgttt caaatctcat catcaaatct tcatccatcc acatgagaaagctt 3774 3 40 DNA Artificial Sequence Description of ArtificialSequence oligonucleotide hybridizing to 5′ region of CEA 3 ccctgtgatctccaggacag ctcagtctcc gtccaatctc 40 4 28 DNA Artificial SequenceDescription of Artificial Sequence oligonucleotide hybridizing to 5′region of CEA 4 gtttcctgag tgatgtctgt gtgcaatg 28 5 35 DNA ArtificialSequence 
Description of Artificial Sequence oligonucleotide primer 5 cctggaactc aagcttgaat tctccacaga ggagg 35
6 7 DNA Artificial Sequence Description of Artificial Sequence consensus sequence A1 from DNA Sequence 13-11 (1990). 6 tatawaw 7
7 15 DNA Artificial Sequence Description of Artificial Sequence consensus sequence A2ca1t from DNA Sequence 13-11 (1990). 7 ttggcnnnnn ngcca 15
8 14 DNA Artificial Sequence Description of Artificial Sequence consensus sequence A4a1t from DNA Sequence 13-11 (1990). 8 rrrncchcac cctg 14
9 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B2 from DNA Sequence 13-11 (1990). 9 gtggwwwg 8
10 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B4 from DNA Sequence 13-11 (1990). 10 gsswgscc 8
11 10 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B12 from DNA Sequence 13-11 (1990). 11 ccwwwwwwgg 10
12 6 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B15 from DNA Sequence 13-11 (1990). 12 gaaagy 6
13 6 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B17 from DNA Sequence 13-11 (1990). 13 tcmytt 6
14 9 DNA Artificial Sequence Description of Artificial Sequence consensus sequence B18 from DNA Sequence 13-11 (1990). 14 ancctctcy 9
15 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence C5 from DNA Sequence 13-11 (1990). 15 gtgsggtg 8
16 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence D9 from DNA Sequence 13-11 (1990). 16 rtgacgtr 8
17 12 DNA Artificial Sequence Description of Artificial Sequence consensus sequence E5 from DNA Sequence 13-11 (1990). 17 accnnnnnng gt 12
18 6 DNA Artificial Sequence Description of Artificial Sequence consensus sequence F2 from DNA Sequence 13-11 (1990). 18 tgrmcc 6
19 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence F6 from DNA Sequence 13-11 (1990). 19 tcntactc 8
20 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence F7 from DNA Sequence 13-11 (1990). 20 tgtttgct 8
21 5 DNA Artificial Sequence Description of Artificial Sequence consensus sequence F9 from DNA Sequence 13-11 (1990). 21 tcact 5
22 9 DNA Artificial Sequence Description of Artificial Sequence consensus sequence F10 from DNA Sequence 13-11 (1990). 22 wtstgggaw 9
23 8 DNA Artificial Sequence Description of Artificial Sequence consensus sequence G2 from DNA Sequence 13-11 (1990). 23 aanccaaa 8
24 6 DNA Artificial Sequence Description of Artificial Sequence consensus sequence G7 from DNA Sequence 13-11 (1990). 24 gataag 6
25 17 DNA Artificial Sequence Description of Artificial Sequence consensus sequence H1 from DNA Sequence 13-11 (1990). 25 rnynncnngy ngktnyn 17
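
The consensus sequences of SEQ ID NOS: 6 to 25 use degenerate nucleotide letters (for example w, r, y, s, m, k, h, n). As an illustration only, and not as part of the disclosed method, the following minimal Python sketch shows one way such a consensus could be scanned against a DNA sequence, assuming the letters follow the standard IUPAC ambiguity codes. The function names `consensus_to_pattern` and `find_sites` and the short test sequences are hypothetical and are not taken from the sequence listing.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
# Assumption: the degenerate letters in the listing follow this convention.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]", "y": "[ct]", "s": "[cg]", "w": "[at]",
    "k": "[gt]", "m": "[ac]",
    "b": "[cgt]", "d": "[agt]", "h": "[act]", "v": "[acg]",
    "n": "[acgt]",
}


def consensus_to_pattern(consensus):
    """Translate a degenerate consensus (e.g. 'tatawaw') into a regex string."""
    return "".join(IUPAC[base] for base in consensus.lower())


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.lower())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the listing.
    print(find_sites("tatawaw", "ggtataaatcc"))
```

Whitespace and position numbers would need to be stripped from a listed sequence before it is passed to `find_sites`; only the forward strand is scanned in this sketch.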

What is claimed is:
 1. A molecular chimera comprising a carcinoembryonic antigen (CEA) transcriptional regulatory sequence (TRS) and a DNA sequence operatively linked thereto encoding a heterologous protein, wherein the CEA TRS comprises: (a) a CEA promoter element; and (b) a CEA enhancer element comprising a nucleic acid sequence selected from (i) the sequence of FIG. 6 from about −14.4 kb to about −10.6 kb; and (ii) fragments of (i) that act as CEA enhancer elements.
 2. A molecular chimera according to claim 1 wherein the heterologous protein is an enzyme capable of catalyzing the production of an agent cytotoxic or cytostatic to CEA+ cells.
 3. A molecular chimera according to claim 1 wherein the CEA TRS and the sequence encoding a heterologous protein are in an expression cassette.
 4. A retroviral shuttle vector comprising a molecular chimera according to claim 1.
 5. A molecular chimera according to claim 1 wherein the CEA promoter element comprises the nucleotide sequence of FIG. 6 from about −90b to about +69b.
 6. A molecular chimera according to claim 1 wherein at least one of said CEA TRS elements is in inverse orientation.
 7. A molecular chimera according to claim 1 containing multiple copies of said CEA promoter element.
 8. A molecular chimera according to claim 1 containing multiple copies of said enhancer element.
 9. A molecular chimera according to claim 1 containing at least two different enhancer elements.
 10. A molecular chimera according to claim 1 where said DNA encodes an enzyme.
 11. The molecular chimera according to claim 2 wherein the heterologous enzyme is cytosine deaminase (CD).
 12. A molecular chimera according to claim 3 and additionally comprising an appropriate polyadenylation sequence which is linked downstream in a 3′ position to said DNA encoding a heterologous protein, and in proper orientation to the CEA TRS.
 13. The retroviral shuttle vector according to claim 4 comprising a DNA sequence comprising a 5′ viral LTR sequence, a cis acting psi encapsidation sequence, the molecular chimera and a 3′ viral LTR sequence.
 14. A retroviral shuttle vector according to claim 4 which is a SIN vector.
 15. An infective virion comprising a retroviral shuttle vector according to claim 4, the vector being encapsidated within viral proteins to create an artificial, infective, replication defective, retrovirus.
 16. A packaging cell line comprising a retroviral shuttle vector according to claim 4.
 17. The retroviral shuttle vector according to claim 13 based on Moloney murine leukaemia virus.
 18. A composition comprising the infective virion according to claim 15 together with a pharmaceutically acceptable carrier.
 19. A composition comprising the packaging cell line according to claim 16 together with a pharmaceutically acceptable carrier.
 20. A method of targeting expression of a heterologous protein to CEA+ cells comprising contacting a population of cells that comprises CEA+ cells with said molecular chimera according to claim 1 under conditions such that said molecular chimera enters said cells and expression of said heterologous protein is effected in said CEA+ cells.
 21. A molecular chimera comprising a carcinoembryonic antigen (CEA) transcriptional regulatory sequence (TRS) and a DNA sequence operatively linked thereto encoding a heterologous enzyme, wherein the CEA TRS comprises: a) a CEA promoter element comprising the sequence of from about −90b to about +69b of FIG. 6; and b) an enhancer element, said enhancer element comprising a nucleic acid sequence selected from (i) the sequence of FIG. 6 from about −14.4 kb to about −10.6 kb, and (ii) fragments thereof that act as CEA enhancer elements.
 22. A molecular chimera according to claim 21 wherein the heterologous enzyme is capable of catalyzing the production of an agent cytotoxic or cytostatic to CEA+ cells.
 23. A molecular chimera according to claim 21 wherein the heterologous enzyme is cytosine deaminase (CD).
 24. A retroviral shuttle vector comprising the molecular chimera according to claim 21.
 25. A method of targeting expression of a heterologous enzyme to CEA+ cells comprising contacting a population of cells that comprises CEA+ cells with said molecular chimera according to claim 21 under conditions such that said molecular chimera enters said cells and expression of said heterologous enzyme protein is effected in said CEA+ cells.